Low-Income Energy Affordability Data (LEAD)

For low income and moderate income energy policy and program planning, LEAD would be a perfect dataset based on which to conduct analytics. It provides state, county and city level data including number of households at different income levels and numbers of homeowners versus renters, and breaks down based on fuel type, building type, and construction year. as well as average monthly energy expenditures and energy burden.

There are two categories of dataset

  • Area Median Income (AMI)
  • Federal Poverty Level (PFL)

Fields' description:

  • "UNITS": "Number of occupied housing units (or households)"
  • "HINCP": "Average annual household income"
  • "ELEP": "Original American Community Survey average household monthly electricity expenditure"
  • "ELEP CAL": "Calibrated American Community Survey average household monthly electricity expenditure"
  • "GASP": "Original American Community Survey average household monthly gas expenditure"
  • "GASP CAL": "Calibrated American Community Survey average household monthly gas expenditure"
  • "GASP-U CAL": "Calibrated American Community Survey average household monthly gas expenditure, filtered likely utility gas only"
  • "FULP": "Original American Community Survey average housing unit monthly other fuel expenditure"
  • "COUNT": "Average number of ACS survey responses contributing to the estimate of energy expenditures"

For more information, please refer to https://openei.org/doe-opendata/dataset/celica-data

0. Build Database Connection

PyAthena is a Python DB API 2.0 (PEP 249) compliant client for the Amazon Athena JDBC driver. https://github.com/laughingman7743/PyAthena

In [1]:
from pyathena.connection import Connection
from pyathena.pandas_cursor import PandasCursor
In [2]:
AWS_REGION_NAME = "us-west-2"
DATABASE_NAME = "oedidb"
AMI68_SCC_TABLE_NAME = "lead_ami68_state_city_county"
AMI68_TRACT_TABLE_NAME = "lead_ami68_tract"
FPL15_SCC_TABLE_NAME = "lead_fpl15_state_city_county"
FPL15_TRACT_TABLE_NAME = "lead_fpl15_tract"
S3_STAGING_DIR = "s3://nrel-tests/lead-staging"
In [3]:
cursor = Connection(region_name=AWS_REGION_NAME, s3_staging_dir=S3_STAGING_DIR).cursor()
pandas_cursor = Connection(region_name=AWS_REGION_NAME, s3_staging_dir=S3_STAGING_DIR).cursor(PandasCursor)

1. Retrieve Table Metadata

1.1 lead_ami68_state_city_county table

Retrieve the schema

In [4]:
import pandas as pd
In [5]:
# Retrieve schema information
result = cursor.execute(f"DESCRIBE {DATABASE_NAME}.{AMI68_SCC_TABLE_NAME}")
columns = [[item.strip() for item in row[0].split("\t")] for row in result.fetchall()]
pd.DataFrame(columns, columns=["NAME", "TYPE", "FROM"])
Out[5]:
NAME TYPE FROM
0 state string
1 placeid string
2 placename string
3 ybl_index int
4 bld_index int
5 hfl_index int
6 ami68 string
7 units double
8 hincp double
9 elep double
10 elep_cal double
11 gasp double
12 gasp_cal double
13 gasp_u_cal double
14 fulp double
15 count double
16 abv string
17
18 # Partition Information
19 # col_name data_type comment
20
21 abv string
In [6]:
# Retrieve parition information
result = cursor.execute(f"SHOW PARTITIONS {DATABASE_NAME}.{AMI68_SCC_TABLE_NAME}")
for row in result.fetchall():
    print(row)
('abv=CT',)
('abv=ID',)
('abv=UT',)
('abv=OK',)
('abv=SD',)
('abv=NY',)
('abv=CO',)
('abv=NE',)
('abv=WY',)
('abv=IA',)
('abv=AL',)
('abv=AZ',)
('abv=RI',)
('abv=IL',)
('abv=ND',)
('abv=OR',)
('abv=IN',)
('abv=VA',)
('abv=SC',)
('abv=PR',)
('abv=PA',)
('abv=LA',)
('abv=OH',)
('abv=NH',)
('abv=MN',)
('abv=TX',)
('abv=FL',)
('abv=TN',)
('abv=WA',)
('abv=MI',)
('abv=AK',)
('abv=MO',)
('abv=KS',)
('abv=MA',)
('abv=NM',)
('abv=MD',)
('abv=MS',)
('abv=NV',)
('abv=CA',)
('abv=AR',)
('abv=HI',)
('abv=NC',)
('abv=WI',)
('abv=KY',)
('abv=MT',)
('abv=WV',)
('abv=DC',)
('abv=GA',)
('abv=DE',)
('abv=NJ',)
('abv=VT',)
('abv=ME',)

1.2 lead_ami68_tract table

In [7]:
# Retrieve schema information
result = cursor.execute(f"DESCRIBE {DATABASE_NAME}.{AMI68_TRACT_TABLE_NAME}")
columns = [[item.strip() for item in row[0].split("\t")] for row in result.fetchall()]
pd.DataFrame(columns, columns=["NAME", "TYPE", "FROM"])
Out[7]:
NAME TYPE FROM
0 geo_id string
1 puma10 string
2 fmr string
3 ybl_index int
4 bld_index int
5 hfl_index int
6 ami68 string
7 units double
8 hincp double
9 elep double
10 elep_cal double
11 gasp double
12 gasp_cal double
13 gasp_u_cal double
14 fulp double
15 count double
16 abv string
17
18 # Partition Information
19 # col_name data_type comment
20
21 abv string
In [8]:
# Retrieve parition information
result = cursor.execute(f"SHOW PARTITIONS {DATABASE_NAME}.{AMI68_TRACT_TABLE_NAME}")
for row in result.fetchall():
    print(row)
('abv=ID',)
('abv=VA',)
('abv=OK',)
('abv=PR',)
('abv=VT',)
('abv=NH',)
('abv=MS',)
('abv=NY',)
('abv=WA',)
('abv=SD',)
('abv=NE',)
('abv=CO',)
('abv=NV',)
('abv=NM',)
('abv=MA',)
('abv=AK',)
('abv=AZ',)
('abv=MO',)
('abv=DE',)
('abv=IL',)
('abv=HI',)
('abv=CA',)
('abv=MT',)
('abv=SC',)
('abv=AL',)
('abv=ME',)
('abv=KY',)
('abv=AR',)
('abv=ND',)
('abv=OR',)
('abv=TN',)
('abv=WI',)
('abv=UT',)
('abv=KS',)
('abv=IA',)
('abv=MI',)
('abv=OH',)
('abv=PA',)
('abv=NJ',)
('abv=NC',)
('abv=CT',)
('abv=IN',)
('abv=WV',)
('abv=FL',)
('abv=MN',)
('abv=RI',)
('abv=MD',)
('abv=DC',)
('abv=WY',)
('abv=LA',)
('abv=TX',)
('abv=GA',)

1.3 lead_fpl15_state_city_county

In [9]:
# Retrieve schema information
result = cursor.execute(f"DESCRIBE {DATABASE_NAME}.{FPL15_SCC_TABLE_NAME}")
columns = [[item.strip() for item in row[0].split("\t")] for row in result.fetchall()]
pd.DataFrame(columns, columns=["NAME", "TYPE", "FROM"])
Out[9]:
NAME TYPE FROM
0 state string
1 placeid string
2 placename string
3 ybl_index int
4 bld_index int
5 hfl_index int
6 fpl15 string
7 units double
8 hincp double
9 elep double
10 elep_cal double
11 gasp double
12 gasp_cal double
13 gasp_u_cal double
14 fulp double
15 count double
16 abv string
17
18 # Partition Information
19 # col_name data_type comment
20
21 abv string
In [10]:
# Retrieve parition information
result = cursor.execute(f"SHOW PARTITIONS {DATABASE_NAME}.{FPL15_SCC_TABLE_NAME}")
for row in result.fetchall():
    print(row)
('abv=DE',)
('abv=CT',)
('abv=GA',)
('abv=VA',)
('abv=ND',)
('abv=SD',)
('abv=HI',)
('abv=MN',)
('abv=NM',)
('abv=TX',)
('abv=DC',)
('abv=VT',)
('abv=NJ',)
('abv=NV',)
('abv=MO',)
('abv=UT',)
('abv=IL',)
('abv=MS',)
('abv=NC',)
('abv=ME',)
('abv=FL',)
('abv=AZ',)
('abv=OR',)
('abv=TN',)
('abv=AR',)
('abv=IA',)
('abv=WV',)
('abv=IN',)
('abv=ID',)
('abv=NY',)
('abv=LA',)
('abv=KY',)
('abv=MI',)
('abv=MT',)
('abv=MD',)
('abv=WA',)
('abv=CO',)
('abv=RI',)
('abv=AK',)
('abv=CA',)
('abv=SC',)
('abv=PR',)
('abv=MA',)
('abv=OH',)
('abv=NE',)
('abv=NH',)
('abv=WI',)

1.4 lead_fpl15_tract

In [11]:
# Retrieve schema information
result = cursor.execute(f"DESCRIBE {DATABASE_NAME}.{FPL15_TRACT_TABLE_NAME}")
columns = [[item.strip() for item in row[0].split("\t")] for row in result.fetchall()]
pd.DataFrame(columns, columns=["NAME", "TYPE", "FROM"])
Out[11]:
NAME TYPE FROM
0 geo_id string
1 puma10 string
2 fmr string
3 ybl_index int
4 bld_index int
5 hfl_index int
6 fpl15 string
7 units double
8 hincp double
9 elep double
10 elep_cal double
11 gasp double
12 gasp_cal double
13 gasp_u_cal double
14 fulp double
15 count double
16 abv string
17
18 # Partition Information
19 # col_name data_type comment
20
21 abv string
In [12]:
# Retrieve parition information
result = cursor.execute(f"SHOW PARTITIONS {DATABASE_NAME}.{FPL15_TRACT_TABLE_NAME}")
for row in result.fetchall():
    print(row)
('abv=NY',)
('abv=WI',)
('abv=CA',)
('abv=WV',)
('abv=TN',)
('abv=LA',)
('abv=CO',)
('abv=WY',)
('abv=NH',)
('abv=DC',)
('abv=PR',)
('abv=SC',)
('abv=SD',)
('abv=HI',)
('abv=NV',)
('abv=FL',)
('abv=ID',)
('abv=VA',)
('abv=RI',)
('abv=CT',)
('abv=ME',)
('abv=IN',)
('abv=OR',)
('abv=AZ',)
('abv=MD',)
('abv=MO',)
('abv=AK',)
('abv=MT',)
('abv=NE',)
('abv=IL',)
('abv=NJ',)
('abv=KY',)
('abv=TX',)
('abv=OH',)
('abv=MI',)
('abv=OK',)
('abv=PA',)
('abv=UT',)
('abv=MN',)
('abv=NM',)
('abv=DE',)
('abv=WA',)
('abv=MA',)
('abv=ND',)
('abv=AL',)
('abv=NC',)
('abv=GA',)
('abv=MS',)
('abv=VT',)
('abv=KS',)
('abv=AR',)
('abv=IA',)

2. AMI Facts

2.1 State Level

To visualize the dataset column balues on the map

In [13]:
import geopandas
from branca import colormap
from branca.element import Template, MacroElement
import folium
from ipywidgets import interact
In [14]:
STATES = ('AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY')
In [15]:
ami68_state = pandas_cursor.execute(
    f"""
    SELECT abv,
        SUM(units) as units, 
        SUM(hincp) as hincp, 
        SUM(elep) as elep, 
        SUM(elep_cal) as elep_cal, 
        SUM(gasp) as gasp, 
        SUM(gasp_cal) as gasp_cal,
        SUM(gasp_u_cal) as gasp_u_cal, 
        SUM(fulp) as fulp, 
        SUM(count) as count
    FROM {DATABASE_NAME}.{AMI68_SCC_TABLE_NAME}
    WHERE placename IN {STATES}
    GROUP BY abv
    """
).as_pandas()
# 
ami68_state.head()
Out[15]:
abv units hincp elep elep_cal gasp gasp_cal gasp_u_cal fulp count
0 OR 1.545745e+06 8.089511e+07 174201.749732 147791.961001 49019.925947 36903.864837 24868.950729 20846.609839 6000.147371
1 IA 1.242641e+06 6.839298e+07 158177.514155 136707.471518 59103.156357 44307.527297 27930.658208 14305.411432 4931.829907
2 SD 3.334901e+05 5.294489e+07 127644.140771 108354.412483 46369.289961 34567.501329 20269.495005 12876.966012 3577.577090
3 ME 5.511090e+05 6.179851e+07 132971.238255 112545.048109 59999.454266 29303.966699 11407.448759 59612.537469 4656.967464
4 NM 7.625510e+05 5.695097e+07 130553.104108 95564.721205 69823.797362 40134.554960 28493.337413 18897.938088 5229.046366
In [16]:
geo_state = geopandas.read_file("LEAD/us-states.geojson")[["id", "geometry"]]
geo_state.rename(columns={"id": "abv"}, inplace=True)
geo_ami68_state = geopandas.GeoDataFrame(
    data=ami68_state.merge(geo_state, on="abv"),
    geometry="geometry",
    crs={"init": "epsg:4326"}
)
geo_ami68_state.head()
Out[16]:
abv units hincp elep elep_cal gasp gasp_cal gasp_u_cal fulp count geometry
0 OR 1.545745e+06 8.089511e+07 174201.749732 147791.961001 49019.925947 36903.864837 24868.950729 20846.609839 6000.147371 POLYGON ((-121.441509309489 41.99433436657502,...
1 IA 1.242641e+06 6.839298e+07 158177.514155 136707.471518 59103.156357 44307.527297 27930.658208 14305.411432 4931.829907 POLYGON ((-91.12013228125002 40.70544336537464...
2 SD 3.334901e+05 5.294489e+07 127644.140771 108354.412483 46369.289961 34567.501329 20269.495005 12876.966012 3577.577090 POLYGON ((-102.7883842921169 42.99530336750724...
3 ME 5.511090e+05 6.179851e+07 132971.238255 112545.048109 59999.454266 29303.966699 11407.448759 59612.537469 4656.967464 (POLYGON ((-69.77727626137293 44.0741483685119...
4 NM 7.625510e+05 5.695097e+07 130553.104108 95564.721205 69823.797362 40134.554960 28493.337413 18897.938088 5229.046366 POLYGON ((-109.049495297948 32.44204435767875,...
In [17]:
columns = [ 'units', 'hincp', 'elep', 'elep_cal', 'gasp', 'gasp_cal', 'gasp_u_cal', 'fulp', 'count']
@interact
def show_ami68_state_facts(column=columns):
    
    # Display state pv system numbers on map
    state_map = folium.Map(location=[39.8283, -98.5795], zoom_start=4, tiles="OpenStreetMap")

    tooltip = folium.GeoJsonTooltip(
        fields=["abv", column],
        aliases=["state: ", column + ":"],
        labels=True,
        sticky=False
    )

    colors = ["#ffffcc", "#fed976", "#feb24c", "#fd8d3c", "#e31a1c",  "#800026"]
    max_value, min_value = geo_ami68_state[column].max(), geo_ami68_state[column].min()
    bins = [min_value + i * (max_value - min_value) / 6 for i in range(7)]
    colorscale = colormap.StepColormap(
        colors=colors,
        index=bins,
        vmin=0,
        vmax=1000000
    )
    def style_function(feature):
        return {
            "color": "#000000",
            "weight": 0.2,
            "opacity": 0.6,
            "fillColor": colorscale(feature["properties"][column]),
            "fillOpacity": 0.4,
        }

    folium.GeoJson(
        name="AMI State Data Facts",
        data=geo_ami68_state.to_json(),
        tooltip=tooltip,
        style_function=style_function
    ).add_to(state_map)

    legend = MacroElement()
    with open("LEAD/map_legend.html") as f:
        template = f.read()
        template = template.replace(">0 - 100", ">{} - {}".format(int(bins[0]), int(bins[1])))
        template = template.replace(">100 - 500", ">{} - {}".format(int(bins[1]), int(bins[2])))
        template = template.replace(">500 - 1000", ">{} - {}".format(int(bins[2]), int(bins[3])))
        template = template.replace(">1000 - 5000", ">{} - {}".format(int(bins[3]), int(bins[4])))
        template = template.replace(">5000 - 10000", ">{} - {}".format(int(bins[4]), int(bins[5])))
        template = template.replace(">10000 - 50000", ">{} - {}".format(int(bins[5]), int(bins[6])))
    legend._template = Template(template)
    state_map.get_root().add_child(legend)

    display(state_map)
In [18]:
column = "elep"
# Display state pv system numbers on map
state_map = folium.Map(location=[39.8283, -98.5795], zoom_start=4, tiles="OpenStreetMap")

tooltip = folium.GeoJsonTooltip(
    fields=["abv", column],
    aliases=["state: ", column + ":"],
    labels=True,
    sticky=False
)

colors = ["#ffffcc", "#fed976", "#feb24c", "#fd8d3c", "#e31a1c",  "#800026"]
max_value, min_value = geo_ami68_state[column].max(), geo_ami68_state[column].min()
bins = [min_value + i * (max_value - min_value) / 6 for i in range(7)]
colorscale = colormap.StepColormap(
    colors=colors,
    index=bins,
    vmin=0,
    vmax=1000000
)
def style_function(feature):
    return {
        "color": "#000000",
        "weight": 0.2,
        "opacity": 0.6,
        "fillColor": colorscale(feature["properties"][column]),
        "fillOpacity": 0.4,
    }

folium.GeoJson(
    name="AMI State Data Facts",
    data=geo_ami68_state.to_json(),
    tooltip=tooltip,
    style_function=style_function
).add_to(state_map)

legend = MacroElement()
with open("LEAD/map_legend.html") as f:
    template = f.read()
    template = template.replace(">0 - 100", ">{} - {}".format(int(bins[0]), int(bins[1])))
    template = template.replace(">100 - 500", ">{} - {}".format(int(bins[1]), int(bins[2])))
    template = template.replace(">500 - 1000", ">{} - {}".format(int(bins[2]), int(bins[3])))
    template = template.replace(">1000 - 5000", ">{} - {}".format(int(bins[3]), int(bins[4])))
    template = template.replace(">5000 - 10000", ">{} - {}".format(int(bins[4]), int(bins[5])))
    template = template.replace(">10000 - 50000", ">{} - {}".format(int(bins[5]), int(bins[6])))
legend._template = Template(template)
state_map.get_root().add_child(legend)

display(state_map)

2.2 Tract Level

Hierarchical clustering is a general family of clustering algorithms that build nested clusters by merging or splitting them successively. For more details, please refer to https://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering

Take CO tracts as an example, to find tracts with similar characteristics on energy consumption.

In [19]:
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt
%matplotlib inline
In [20]:
ami68_co_tract = pandas_cursor.execute(
    f"""
    SELECT geo_id,
        SUM(units) as units, 
        SUM(hincp) as hincp, 
        SUM(elep) as elep, 
        SUM(elep_cal) as elep_cal, 
        SUM(gasp) as gasp, 
        SUM(gasp_cal) as gasp_cal,
        SUM(gasp_u_cal) as gasp_u_cal, 
        SUM(fulp) as fulp, 
        SUM(count) as count
    FROM {DATABASE_NAME}.{AMI68_TRACT_TABLE_NAME} 
    WHERE abv = 'CO'
    GROUP BY geo_id
    """
).as_pandas()
ami68_co_tract.head()
Out[20]:
geo_id units hincp elep elep_cal gasp gasp_cal gasp_u_cal fulp count
0 8013012508 1337.0 1.230347e+07 15589.473177 13407.657429 7073.008668 5815.221933 5127.843435 1260.307693 1885.000000
1 8041004603 2211.0 1.302889e+07 33468.214676 28860.622082 23409.322213 17610.971302 9371.117327 4266.330931 1757.493333
2 8041004702 849.0 4.316919e+06 8828.201361 7473.548139 3661.574511 2795.356489 2748.971876 125.219086 1429.000000
3 8041004705 3099.0 8.733552e+06 17782.173017 16534.698451 8050.068605 6267.053515 5519.895662 648.365075 1662.000000
4 8059009842 1243.0 5.265632e+06 8508.912100 7013.023894 4069.524244 3185.327114 3185.327114 110.407669 2021.223108
In [21]:
points = ami68_co_tract.drop(columns=["geo_id"]).to_numpy()
labels = ami68_co_tract["geo_id"].values
In [22]:
fig = plt.figure(figsize=(80, 600))
# create dendrogram
dendrogram = sch.dendrogram(sch.linkage(points, method="ward"), orientation="right", leaf_font_size=20, labels=labels)
# create clusters
hc = AgglomerativeClustering(n_clusters=4, affinity ="euclidean", linkage="ward")
# save clusters for chart
hc.fit_predict(points)

plt.show()

3. FPL Facts

3.1 State Level

Show FPL data facts on Map

In [23]:
fpl15_state = pandas_cursor.execute(
    f"""
    SELECT abv,
        SUM(units) as units, 
        SUM(hincp) as hincp, 
        SUM(elep) as elep, 
        SUM(elep_cal) as elep_cal, 
        SUM(gasp) as gasp, 
        SUM(gasp_cal) as gasp_cal,
        SUM(gasp_u_cal) as gasp_u_cal, 
        SUM(fulp) as fulp, 
        SUM(count) as count
    FROM {DATABASE_NAME}.{FPL15_SCC_TABLE_NAME}
    WHERE placename IN {STATES}
    GROUP BY abv
    """
).as_pandas()
# 
fpl15_state.head()
Out[23]:
abv units hincp elep elep_cal gasp gasp_cal gasp_u_cal fulp count
0 WA 2.696606e+06 9.138398e+07 190182.971932 148293.869768 58181.966686 47098.501837 30646.560465 21697.908613 6782.187641
1 RI 4.102086e+05 6.434302e+07 138544.550498 142201.066504 73605.777119 59232.487645 42297.047641 31691.905808 5007.353427
2 CA 1.280739e+07 1.536711e+08 243052.300200 224048.290401 92850.508585 81006.999088 60041.530600 21160.558921 6891.729401
3 NH 5.213730e+05 7.852272e+07 171074.330057 161106.752649 85108.562318 46509.375574 26330.278895 55511.787128 5029.589316
4 UT 9.183670e+05 6.392366e+07 120017.115136 99501.251527 65541.547202 55685.801818 42489.304690 10251.834275 5110.979628
In [24]:
geo_state = geopandas.read_file("LEAD/us-states.geojson")[["id", "geometry"]]
geo_state.rename(columns={"id": "abv"}, inplace=True)
geo_fpl15_state = geopandas.GeoDataFrame(
    data=fpl15_state.merge(geo_state, on="abv"),
    geometry="geometry",
    crs={"init": "epsg:4326"}
)
In [25]:
columns = [ 'units', 'hincp', 'elep', 'elep_cal', 'gasp', 'gasp_cal', 'gasp_u_cal', 'fulp', 'count']
@interact
def show_fpl15_state_facts(column=columns):
    
    # Display state pv system numbers on map
    state_map = folium.Map(location=[39.8283, -98.5795], zoom_start=4, tiles="OpenStreetMap")

    tooltip = folium.GeoJsonTooltip(
        fields=["abv", column],
        aliases=["state: ", column + ":"],
        labels=True,
        sticky=False
    )

    colors = ["#ffffcc", "#fed976", "#feb24c", "#fd8d3c", "#e31a1c",  "#800026"]
    max_value, min_value = geo_fpl15_state[column].max(), geo_fpl15_state[column].min()
    bins = [min_value + i * (max_value - min_value) / 6 for i in range(7)]
    colorscale = colormap.StepColormap(
        colors=colors,
        index=bins,
        vmin=0,
        vmax=1000000
    )
    def style_function(feature):
        return {
            "color": "#000000",
            "weight": 0.2,
            "opacity": 0.6,
            "fillColor": colorscale(feature["properties"][column]),
            "fillOpacity": 0.4,
        }

    folium.GeoJson(
        name="FPL State Data Facts",
        data=geo_fpl15_state.to_json(),
        tooltip=tooltip,
        style_function=style_function
    ).add_to(state_map)

    legend = MacroElement()
    with open("LEAD/map_legend.html") as f:
        template = f.read()
        template = template.replace(">0 - 100", ">{} - {}".format(int(bins[0]), int(bins[1])))
        template = template.replace(">100 - 500", ">{} - {}".format(int(bins[1]), int(bins[2])))
        template = template.replace(">500 - 1000", ">{} - {}".format(int(bins[2]), int(bins[3])))
        template = template.replace(">1000 - 5000", ">{} - {}".format(int(bins[3]), int(bins[4])))
        template = template.replace(">5000 - 10000", ">{} - {}".format(int(bins[4]), int(bins[5])))
        template = template.replace(">10000 - 50000", ">{} - {}".format(int(bins[5]), int(bins[6])))
    legend._template = Template(template)
    state_map.get_root().add_child(legend)

    display(state_map)
In [27]:
column = "elep" 
# Display state pv system numbers on map
state_map = folium.Map(location=[39.8283, -98.5795], zoom_start=4, tiles="OpenStreetMap")

tooltip = folium.GeoJsonTooltip(
    fields=["abv", column],
    aliases=["state: ", column + ":"],
    labels=True,
    sticky=False
)

colors = ["#ffffcc", "#fed976", "#feb24c", "#fd8d3c", "#e31a1c",  "#800026"]
max_value, min_value = geo_fpl15_state[column].max(), geo_fpl15_state[column].min()
bins = [min_value + i * (max_value - min_value) / 6 for i in range(7)]
colorscale = colormap.StepColormap(
    colors=colors,
    index=bins,
    vmin=0,
    vmax=1000000
)
def style_function(feature):
    return {
        "color": "#000000",
        "weight": 0.2,
        "opacity": 0.6,
        "fillColor": colorscale(feature["properties"][column]),
        "fillOpacity": 0.4,
    }

folium.GeoJson(
    name="FPL State Data Facts",
    data=geo_fpl15_state.to_json(),
    tooltip=tooltip,
    style_function=style_function
).add_to(state_map)

legend = MacroElement()
with open("LEAD/map_legend.html") as f:
    template = f.read()
    template = template.replace(">0 - 100", ">{} - {}".format(int(bins[0]), int(bins[1])))
    template = template.replace(">100 - 500", ">{} - {}".format(int(bins[1]), int(bins[2])))
    template = template.replace(">500 - 1000", ">{} - {}".format(int(bins[2]), int(bins[3])))
    template = template.replace(">1000 - 5000", ">{} - {}".format(int(bins[3]), int(bins[4])))
    template = template.replace(">5000 - 10000", ">{} - {}".format(int(bins[4]), int(bins[5])))
    template = template.replace(">10000 - 50000", ">{} - {}".format(int(bins[5]), int(bins[6])))
legend._template = Template(template)
state_map.get_root().add_child(legend)

state_map
Out[27]:

3.2 Tract Level

Spatial Autocorrelation - Attribute Similarity Measurements.
Take CO tracts as an example (Notes - there are some multipolygons which were removed in this analysis example.)

In [28]:
import mapclassify as mc
import pysal
from pysal.lib import weights
from pysal.viz.splot import libpysal
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/pysal/explore/segregation/network/network.py:16: UserWarning: You need pandana and urbanaccess to work with segregation's network module
You can install them with  `pip install urbanaccess pandana` or `conda install -c udst pandana urbanaccess`
  "You need pandana and urbanaccess to work with segregation's network module\n"
/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/pysal/model/spvcm/abstracts.py:10: UserWarning: The `dill` module is required to use the sqlite backend fully.
  from .sqlite import head_to_sql, start_sql
In [29]:
fpl15_co_tract = pandas_cursor.execute(
    f"""
    SELECT geo_id,
        SUM(units) as units, 
        SUM(hincp) as hincp, 
        SUM(elep) as elep, 
        SUM(elep_cal) as elep_cal, 
        SUM(gasp) as gasp, 
        SUM(gasp_cal) as gasp_cal,
        SUM(gasp_u_cal) as gasp_u_cal, 
        SUM(fulp) as fulp, 
        SUM(count) as count
    FROM {DATABASE_NAME}.{FPL15_TRACT_TABLE_NAME} 
    WHERE abv = 'CO'
    GROUP BY geo_id
    """
).as_pandas()
fpl15_co_tract.loc[:, "geo_id"] = ["0"+geo_id for geo_id in fpl15_co_tract["geo_id"].values]
fpl15_co_tract.head()
Out[29]:
geo_id units hincp elep elep_cal gasp gasp_cal gasp_u_cal fulp count
0 08001008536 895.0 7.352250e+06 13685.026818 13159.290663 7014.507309 4698.619585 3963.931859 445.084398 1669.000000
1 08001009203 1490.0 5.556064e+06 13289.657442 13079.264005 5040.612453 4975.596613 4432.989174 108.851644 806.000000
2 08005005552 1086.0 7.389023e+06 15588.223567 13068.859792 6822.994178 5908.467703 4857.702665 304.792787 2015.000000
3 08005006704 1533.0 3.469648e+06 5244.997954 4411.610001 2845.346474 2231.459982 2231.459982 168.421400 1752.447733
4 08005082900 1576.0 6.639607e+06 13000.647949 10207.044761 7366.489054 5044.841726 3812.936400 176.968563 1471.000000
In [30]:
geo_co_tract = geopandas.read_file("LEAD/co_tracts.geojson")
geo_co_tract.rename(columns={"FIPS": "geo_id"}, inplace=True)
geo_co_tract.drop(columns=["OBJECTID"], inplace=True)
In [31]:
fpl15_geo_co_tract = geopandas.GeoDataFrame(
    data=fpl15_co_tract.merge(geo_co_tract, on="geo_id"),
    geometry="geometry",
    crs={"init": "epsg:4326"}
)
fpl15_geo_co_tract.head()
Out[31]:
geo_id units hincp elep elep_cal gasp gasp_cal gasp_u_cal fulp count geometry
0 08001008536 895.0 7.352250e+06 13685.026818 13159.290663 7014.507309 4698.619585 3963.931859 445.084398 1669.000000 POLYGON ((-104.806563 39.914313, -104.804247 3...
1 08001009203 1490.0 5.556064e+06 13289.657442 13079.264005 5040.612453 4975.596613 4432.989174 108.851644 806.000000 POLYGON ((-104.977688999 39.872704, -104.97769...
2 08005005552 1086.0 7.389023e+06 15588.223567 13068.859792 6822.994178 5908.467703 4857.702665 304.792787 2015.000000 POLYGON ((-105.034171 39.65329, -105.034017 39...
3 08005006704 1533.0 3.469648e+06 5244.997954 4411.610001 2845.346474 2231.459982 2231.459982 168.421400 1752.447733 POLYGON ((-104.959678 39.632565, -104.959683 3...
4 08005082900 1576.0 6.639607e+06 13000.647949 10207.044761 7366.489054 5044.841726 3812.936400 176.968563 1471.000000 POLYGON ((-104.769973 39.696558, -104.769547 3...
In [32]:
w = weights.contiguity.Queen.from_dataframe(fpl15_geo_co_tract)
w.transform = "r"
In [33]:
libpysal.plot_spatial_weights(w, fpl15_geo_co_tract,  figsize=(30, 30))
Out[33]:
(<Figure size 2160x2160 with 1 Axes>,
 <matplotlib.axes._subplots.AxesSubplot at 0x7f0967ec8630>)
In [34]:
y = fpl15_geo_co_tract["elep"]
ylag = weights.lag_spatial(w, y)
In [35]:
ylagq5 = mc.Quantiles(ylag, k=5)
fig, ax = plt.subplots(1, figsize=(30, 30))
fpl15_geo_co_tract.assign(cl=ylagq5.yb).plot(column="cl", categorical=True, k=5, cmap="GnBu", linewidth=0.1, ax=ax, edgecolor="white", legend=True)
ax.set_axis_off()
plt.title("Spatial Lag Median ELEP Price (Quintiles)")

plt.show()